177 research outputs found

Machine learning approaches in computational linguistics : introduction

Authors: Hinrichs Erhard; Kübler Sandra
Publication date: 21/10/2008

Hochschulschriftenserver - Universität Frankfurt am Main

From chunks to function-argument structure : a similarity-based approach

Authors: Hinrichs Erhard; Kübler Sandra
Publication date: 01/01/2001

Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similarity-based algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function-argument structure. The results of 89.73% correct functional labels for German and 90.40% for English validate the general approach.

CiteSeerX

Crossref

Publikationsserver der Universität Tübingen

Hochschulschriftenserver - Universität Frankfurt am Main

TüSBL : a similarity-based chunk parser for robust syntactic processing

Authors: Hinrichs Erhard; Kübler Sandra
Publication date: 01/01/2001

Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. The TüSBL parser extends current chunk parsing techniques with a tree-construction component that expands partial chunk parses into complete tree structures, including recursive phrase structure as well as function-argument structure. TüSBL's tree construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank. A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of approximately 67,000 fully annotated sentences. The basic PARSEVAL measures were used, although they were developed for parsers that have as their main goal a complete analysis that spans the entire input. This runs counter to the basic philosophy underlying TüSBL, which has as its main goal robustness of partially analyzed structures.

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Treebank profiling of spoken and written German

Authors: Hinrichs Erhard; Kübler Sandra
Publication date: 01/01/2005

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogs, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

Hochschulschriftenserver - Universität Frankfurt am Main

What linguists always wanted to know about German and did not know how to estimate

Authors: Hinrichs Erhard; Kübler Sandra
Publication date: 01/01/2006

This paper profiles significant differences in syntactic distribution and differences in word class frequencies for two treebanks of spoken and written German: the TüBa-D/S, a treebank of transliterated spontaneous dialogues, and the TüBa-D/Z treebank of newspaper articles published in the German daily newspaper die tageszeitung (taz). The approach can be used more generally as a means of distinguishing and classifying language corpora of different genres.

Hochschulschriftenserver - Universität Frankfurt am Main

A unified representation for morphological, syntactic, semantic, and referential annotations

Authors: Hinrichs Erhard; Kübler Sandra; Naumann Karin
Publication date: 01/01/2004

This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an ongoing project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper discusses how this unified annotation scheme relates to other formats currently discussed in the literature, in particular the annotation graph model of Bird and Liberman (2001) and the pie-in-the-sky scheme for semantic annotation.

Crossref

Hochschulschriftenserver - Universität Frankfurt am Main

The Tüba-D/Z treebank : annotating German with a context-free backbone

Authors: Hinrichs Erhard; Kübler Sandra; Telljohann Heike
Publication date: 01/01/2004

The purpose of this paper is to describe the TüBa-D/Z treebank of written German and to compare it to the independently developed TIGER treebank (Brants et al., 2002). Both treebanks, TIGER and TüBa-D/Z, use an annotation framework that is based on phrase structure grammar and that is enhanced by a level of predicate-argument structure. The comparison between the annotation schemes of the two treebanks focuses on the different treatments of free word order and discontinuous constituents in German as well as on differences in phrase-internal annotation

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

A New Approach to Feature Instantiation in GPSG

Author: Hinrichs Erhard W.
Publication venue: Ohio State University. Department of Linguistics
Publication date: 01/07/1985

KnowledgeBank at OSU

Robust Syntactic Annotation of Corpora and Memory-based Parsing

Author: Hinrichs Erhard W.
Publication venue: The Korean Society for Language and Information
Publication date: 01/01/2002

Waseda University Repository

Parsing coordinations

Authors: Hinrichs Erhard; Klett Eva; Kübler Sandra; Maier Wolfgang
Publication date: 05/05/2009

The present paper is concerned with statistical parsing of constituent structures in German. The paper presents four experiments that aim at improving parsing performance on coordinate structures: 1) reranking the n-best parses of a PCFG parser, 2) enriching the input to a PCFG parser with gold scopes for any conjunct, 3) reranking the parser output for all possible scopes for conjuncts that are permissible with regard to clause structure. Experiment 4 reranks a combination of parses from experiments 1 and 3. The experiments presented show that n-best parsing combined with reranking improves results by a large margin. Providing the parser with different scope possibilities and reranking the resulting parses increases the F-score from 69.76 for the baseline to 74.69. While the F-score is similar to that of the first experiment (n-best parsing and reranking), the first experiment results in higher recall (75.48% vs. 73.69%) and the third one in higher precision (75.43% vs. 73.26%). Combining the two methods yields the best result, with an F-score of 76.69.

Hochschulschriftenserver - Universität Frankfurt am Main